5.5 Indexing

Part of doing interesting things with data is being able to select just the data that you need for a particular circumstance. You’ve already seen how to get a particular element from a vector or matrix, or a specific component from a list, using indices. A datum’s index is its position in the vector or list. For example, to get the second element of a vector A, we use index 2 in square brackets: A[2]. The process of selecting elements using their indices is called indexing, and R provides multiple ways of indexing vectors. Below we’ll cover some basic indexing and more advanced indexing for the different data structures in R.

5.5.1 Vectors

Let’s define a vector and access an element in the way you already know:

# create an example vector
V <- c("A", "B", "C", "D", "E", "F", "G", "H", "I")

# access the 5th element
V[5]
[1] "E"

Unlike many other languages, R indices start at 1, not 0, so the first element is accessed as V[1], the second as V[2], and so on.

Here are some other ways you can index. You can access multiple elements at the same time using a numeric vector of indices:

V[c(1, 2, 5)]  # access elements 1, 2, and 5
[1] "A" "B" "E"

If you need to access several indices in a row, you can use a colon (:):

V[2:7]  # access elements 2 through 7
[1] "B" "C" "D" "E" "F" "G"

You can even combine these two methods:

V[c(1:3, 6)]  # access elements 1, 2, 3, and 6
[1] "A" "B" "C" "F"

Note that the following are all equivalent ways to access the first three elements of V:

  • V[1:3]
  • V[c(1,2,3)]
  • V[c(1:3)]
  • V[c(1:2,3)]
  • can you think of another example?

But the first way is probably the clearest for someone else reading your code. All of these methods work with assignment as well:

V[c(1, 7:9)] <- "X"  # change elements 1, 7, 8, and 9 to "X"
V
[1] "X" "B" "C" "D" "E" "F" "X" "X" "X"

Even though these examples use a character vector, this indexing works on vectors of any type.
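
As a quick check of that claim, here is a minimal sketch that applies the same indexing forms to a numeric vector. The vector N is a made-up example and is not part of the chapter's data.

```r
# a hypothetical numeric vector (not from the chapter's data)
N <- c(10, 20, 30, 40, 50, 60)

N[2]            # a single element: 20
N[c(1, 4)]      # several elements: 10 40
N[2:4]          # a run of elements: 20 30 40
N[c(1:2, 6)]    # both forms combined: 10 20 60

# two spellings of "the first three elements" give identical results
same <- identical(N[1:3], N[c(1, 2, 3)])
same            # TRUE

# assignment works the same way as with the character vector
N[c(1, 5:6)] <- 0
N               # 0 20 30 40 0 0
```

The identical() call is just one way to confirm that two indexing expressions select the same elements.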

5.5.2 Matrices

To access an element of a matrix, we have to specify the row and the column. Let’s create a matrix from the V vector and access one of its elements:

M <- matrix(V, 3, 3)  # create matrix M with data from vector V
M
     [,1] [,2] [,3]
[1,] "X"  "D"  "X" 
[2,] "B"  "E"  "X" 
[3,] "C"  "F"  "X" 
M[1,2]  # access the element in row 1, column 2
[1] "D"

Recall that we can access an entire row or column by leaving the other index blank:

M[1,]  # access the entire first row
[1] "X" "D" "X"
M[,2]  # access the entire second column
[1] "D" "E" "F"

But any of the indexing we just used for vectors can also be used on matrices:

M[1:2, c(2, 3)]  # access the elements in rows 1 and 2, columns 2 and 3.
     [,1] [,2]
[1,] "D"  "X" 
[2,] "E"  "X" 

Finally, there is one more way of indexing matrices (for now), which uses only one index:

M[5]  # access the "5th" element of the matrix
[1] "E"

If you give one index, then R will count down the first column, then the second, then the third, etc., until it reaches the index you specified. Notice how this agrees with the 5th element of the vector V, which was used to make our matrix! And finally, as before, any of these indexing methods can be used to change an element’s value:

M[2, 1:3] <- "Hats"
M
     [,1]   [,2]   [,3]  
[1,] "X"    "D"    "X"   
[2,] "Hats" "Hats" "Hats"
[3,] "C"    "F"    "X"   
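
Returning to the single-index form for a moment: a single index counts through the matrix one column at a time. The sketch below uses a hypothetical numeric matrix Q whose values equal their positions, which makes the counting order easy to see.

```r
# a hypothetical numeric matrix, filled column by column (R's default)
Q <- matrix(1:6, nrow = 2, ncol = 3)
Q
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

Q[4]      # the 4th value, counting down the columns: 4
Q[2, 2]   # the same element by row and column: 4
```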

5.5.3 Lists

So far we’ve discussed three different ways of accessing elements in a list:

L <- list(a = "apple", b = "banana", c = "cherry")

L[[1]]  # access using index number
[1] "apple"
L[["b"]]  # access using component name
[1] "banana"
L$c  # access using component name and dollar sign notation
[1] "cherry"

And these are basically the only options for getting at a single component. Unfortunately, double brackets extract exactly one component, so you cannot pass them a vector of indices in order to access multiple components at once:

L[[1:3]]  # this does not work
Error in L[[1:3]]: recursive indexing failed at level 2

What L[[1:3]] actually does (as the error message might suggest) is access elements within a nested list, but that is beyond the scope of this class.
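
Strictly speaking, the restriction applies to double brackets. Single brackets do accept a vector of indices on a list, but they return a smaller list rather than the components themselves. A minimal sketch, using a hypothetical list named fruit so that L above is left untouched:

```r
# a hypothetical list, separate from L above
fruit <- list(a = "apple", b = "banana", c = "cherry")

one  <- fruit[["b"]]      # double brackets: the component itself
some <- fruit[c(1, 3)]    # single brackets: a smaller list

one            # "banana"
class(some)    # "list"
names(some)    # "a" "c"
```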

Create a vector containing the numbers 1 through 1000 in order (hint: try using 1:1000 on the right of the assignment operator). Then, change elements 4, 196, and 501 through 556 to “brussels sprouts”. What happened to the other elements in the vector?

5.5.4 Data Frames

Remember that data frames are just lists of vectors, so the indexing rules for both lists and vectors apply. But remember that matrix indexing rules also apply! Here are some examples with the Olympic athletes data.

athletes3 <- athletes[1:20,1:5]  # get the first 20 rows and first 5 columns, and assign it to athletes3
athletes3$Name  # get the Name column
 [1] "A Dijiang"                "A Lamusi"                
 [3] "Gunnar Nielsen Aaby"      "Edgar Lindenau Aabye"    
 [5] "Christine Jacoba Aaftink" "Christine Jacoba Aaftink"
 [7] "Christine Jacoba Aaftink" "Christine Jacoba Aaftink"
 [9] "Christine Jacoba Aaftink" "Christine Jacoba Aaftink"
[11] "Per Knut Aaland"          "Per Knut Aaland"         
[13] "Per Knut Aaland"          "Per Knut Aaland"         
[15] "Per Knut Aaland"          "Per Knut Aaland"         
[17] "Per Knut Aaland"          "Per Knut Aaland"         
[19] "John Aalberg"             "John Aalberg"            

Remember that each column of a data frame is just a vector, so we can use list indexing to access the Name column, then immediately use vector indexing to get only the elements that we want:

athletes3$Name[1:3]  # get the first three elements of the Name column
[1] "A Dijiang"           "A Lamusi"            "Gunnar Nielsen Aaby"

Notice that double-bracket and dollar-sign list indexing give you only one component at a time (and data frame columns are list components), whereas matrix indexing can select multiple columns at once. Since data frames can use matrix indexing, you can select multiple columns at once, as the first example above showed.

You can also access columns by name:

athletes3[,c("Name", "Sex")]  # access the Name and Sex columns by name
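
If you don't have the athletes data loaded, the same indexing styles can be tried on any data frame. The sketch below uses a small made-up data frame df (its names and values are invented for illustration) to put list-style and matrix-style indexing side by side.

```r
# a small hypothetical data frame standing in for the athletes data
df <- data.frame(
  Name = c("Ana", "Ben", "Cy", "Dee"),
  Sex  = c("F", "M", "M", "F"),
  Age  = c(23, 31, 27, 35)
)

df$Age                     # list style: one column as a vector
df[["Age"]][2:3]           # list style, then vector indexing: 31 27
df[2:3, c("Name", "Age")]  # matrix style: rows 2 and 3, two columns by name
df[, 1:2]                  # matrix style: all rows, first two columns
```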

Using the mtcars data frame (included in R), get the mpg for the cars in rows 15 through 20, and assign it to a vector. Now find the average mpg of those cars.

Think it’s weird that data frames can be indexed like matrices? It gets weirder. When vectors have names, they can be indexed like lists! Try for yourself: create a vector a <- c(1, 2, 3) and set the names with names(a) <- c("angus", "brillow", "chandelier"), then see what happens if you type a[["angus"]]! Matrices can be accessed using names as well.
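
For the matrix case, here is a minimal sketch. The matrix scores and its row and column names are hypothetical, chosen only to show name-based access.

```r
# a hypothetical matrix with row and column names
scores <- matrix(1:6, nrow = 2, ncol = 3)
rownames(scores) <- c("home", "away")
colnames(scores) <- c("q1", "q2", "q3")

scores["away", "q2"]       # one element by name: 4
scores["home", ]           # a whole row by name
scores[, c("q1", "q3")]    # two columns by name
```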

5.5.5 Advanced indexing

There are even more ways to select the data you need from your R data structures. Let’s look at some more advanced techniques.

5.5.5.1 Logical based indexing

One very useful method that R provides is to access elements of a vector using a different, logical vector of the same length. As the following example will show, R will give only the elements which are true in the logical vector:

v <- c("alpha", "bravo", "charlie", "delta")  # the vector we want to access
i <- c(FALSE, TRUE, FALSE, TRUE)  # the logical vector we'll use to index.

# index v using i:
v[i]
[1] "bravo" "delta"

Why is this so useful? Remember that you can create logical vectors by comparing any type of vector to some value:

v == "delta"
[1] FALSE FALSE FALSE  TRUE

This means you can create a logical vector in order to extract only the elements of a vector which match some criterion. For example, let’s create a logical vector based on whether an Olympic athlete’s sport was “Tug-Of-War”.

plays_tug_of_war <- athletes$Sport == "Tug-Of-War"  # create logical vector

sum(plays_tug_of_war)  # count how many TRUE's
[1] 170

Now let’s use that logical vector to get the names of the athletes:

athletes$Name[plays_tug_of_war]
  [1] "Edgar Lindenau Aabye"                  
  [2] "Albrekt Persson Almqvist"              
  [3] "Arvid Leander Andersson"               
  [4] "Rudolf Arnold"                         
  [5] "Adriano Arnoldo"                       
  [6] "Edward \"Ned\" Barrett"                
  [7] "Denis Raymond Basset"                  
  [8] "Henri Baur"                            
  [9] "Wilhelmus Johannes \"Wim\" Bekkers"    
 [10] "Adolf Bergman"                         
 [11] "Wilhelm Heinrich Born"                 
 [12] "Edouard Bourguignon"                   
 [13] "Max Braun"                             
 [14] "Carleton Lyman \"Carl\" Brosius"       
 [15] "Wilbur Gordon Burroughs, Sr."          
 [16] "Thomas \"Tom\" Butler"                 
 [17] "Silvio Calzolari"                      
 [18] "George Walter Canning"                 
 [19] "Romolo Luigi Tullio Carpi"             
 [20] "Charles Chadwick"                      
 [21] "Walter Chaffe"                         
 [22] "Walter Chaffe"                         
 [23] "James Michael \"Jim\" Clarke"          
 [24] "William Wesley Coe, Jr."               
 [25] "Jean Collas"                           
 [26] "Arthur Kent Dearborn"                  
 [27] "Charles J. Dieges"                     
 [28] "Dimitrios Dimitrakopoulos"             
 [29] "Wilhelm \"Willy\" Drr"                 
 [30] "Joseph Dowler"                         
 [31] "Joseph Dowler"                         
 [32] "Alphonse Ducatillon"                   
 [33] "Ernest Walter Ebbage"                  
 [34] "Johan Viktor Edman"                    
 [35] "Frans Oskar Fast"                      
 [36] "Lawrence Edward Joseph Feuerbach"      
 [37] "Stephen Calvin Fields"                 
 [38] "John Joseph Flanagan"                  
 [39] "Patrick \"Pat\" Flanagan"              
 [40] "Giovanni Roberto Forni"                
 [41] "Erik Algot Fredriksson"                
 [42] "Oscar Charles Friede"                  
 [43] "Nikolaos P. Georgantas"                
 [44] "Anastasios Georgopoulos"               
 [45] "Charles Marius Duma Adolphe Gondouin"  
 [46] "Frederick William Goodfellow"          
 [47] "Erik Gustaf Granfelt"                  
 [48] "Sylvester Granrose (-Grnros)"          
 [49] "William Greggan"                       
 [50] "Andreas Gustaf Grnberger"              
 [51] "Johan Gustaf Anton Gustafsson"         
 [52] "Per August Gustafsson"                 
 [53] "Charles Haberkorn"                     
 [54] "Johannes Hendrikus \"Jan\" Hengeveld"  
 [55] "Francis Henriquez de Zubira"           
 [56] "Pieter Hillense"                       
 [57] "William Hirons"                        
 [58] "Torsten Oswald Magnus Holmberg"        
 [59] "Frederick William \"Fred\" Holmes"     
 [60] "Karl Hltl"                             
 [61] "Thomas Homewood"                       
 [62] "Marquis Franklin \"Bill\" Horr"        
 [63] "Frederick Harkness Humphreys"          
 [64] "Frederick Harkness Humphreys"          
 [65] "Frederick Harkness Humphreys"          
 [66] "Mathias \"Matt\" Hynes"                
 [67] "Albert Ireton"                         
 [68] "Harry Jacobs"                          
 [69] "Sijtse Jansma"                         
 [70] "Hendrikus Alexander \"Henk\" Janssen"  
 [71] "Carl Emil \"Carl-Emil\" Johansson"     
 [72] "Oskar Emil Johansson"                  
 [73] "Knut Richard Johansson"                
 [74] "Sidney B. \"Sid\" Johnson"             
 [75] "Samuel Symington \"Sam\" Jones"        
 [76] "Carl Jonsson"                          
 [77] "Periklis Kakousis"                     
 [78] "Carl Kaltenbach"                       
 [79] "Lloyd Albert Kelsey"                   
 [80] "Alexander Kidd"                        
 [81] "Josef Krmer"                           
 [82] "Karl Erik Krook"                       
 [83] "Joseph Kszyczewski"                    
 [84] "Frank X. Kugler"                       
 [85] "Leopold Lahner"                        
 [86] "Erik Victor Larsson"                   
 [87] "Konstantinos Lazaros"                  
 [88] "Spyridon \"Spyros\" Lazaros"           
 [89] "Eric Otto Valdemar Lemming"            
 [90] "Rudolf Lindmayer"                      
 [91] "Carl Herbert Lindstrm"                 
 [92] "Pieter Lombard"                        
 [93] "Daniel McDonald Lowey"                 
 [94] "Raymond Rmy Maertens"                  
 [95] "Conrad Emanuel Magnusson (Magnusen-)"  
 [96] "Matthew John \"Matt\" McGrath"         
 [97] "Frederick Harris Merriman"             
 [98] "Vasilios Metalos"                      
 [99] "Edwin Archer Mills"                    
[100] "Edwin Archer Mills"                    
[101] "Edwin Archer Mills"                    
[102] "James Sarsfield \"Jim\" Mitchel"       
[103] "Alexander Munro"                       
[104] "Alexander Munro"                       
[105] "August Nilsson"                        
[106] "Karl-Gottfrid Nilsson"                 
[107] "Karl Axel Patrik Norling"              
[108] "Oscar G. Olson"                        
[109] "Georgios Papakhristou"                 
[110] "William Penn"                          
[111] "Patrick \"Paddy\" Philbin"             
[112] "Christian Albert Piek"                 
[113] "Henri Pintens"                         
[114] "Georgios Psakhos"                      
[115] "Vasilios Psakhos"                      
[116] "Rodolfo Rambozzi"                      
[117] "Wilhelm Ritzenhoff"                    
[118] "August Henry \"Gus\" Rodenberg"        
[119] "Louis Joseph Roffo"                    
[120] "Joseph Albertus Rond"                  
[121] "Heinrich Rondi"                        
[122] "Charles \"Chuck\" Rose"                
[123] "Ralph Waldo Rose"                      
[124] "mile Pierre Sarrade"                   
[125] "Carlo Schiappapietra"                  
[126] "Eugen Stahl Schmidt"                   
[127] "Heinrich Schneidereit"                 
[128] "Johannes Schutte"                      
[129] "Henry Seiling"                         
[130] "William Bernard Seiling"               
[131] "John Sewell"                           
[132] "John Sewell"                           
[133] "John James Shepherd"                   
[134] "John James Shepherd"                   
[135] "John James Shepherd"                   
[136] "Willie Slade"                          
[137] "George Smith"                          
[138] "Gustaf Fredrik Sderstrm"               
[139] "Franz Solar"                           
[140] "Karl Gustaf Vilhelm Staaf (Johansson-)"
[141] "Josef Steinbach"                       
[142] "Harold Joseph \"Harry\" Stiff"         
[143] "Carl Gustaf Gunnar Svensson"           
[144] "Thomas Swindlehurst"                   
[145] "Leander James \"Lee\" Talbott, Jr."    
[146] "William Baldry \"Walter\" Tammas"      
[147] "Charles Henry Robert Thias"            
[148] "Ernest Arthur Thorne"                  
[149] "Giuseppe Tonani"                       
[150] "Panagiotis Trivoulidis"                
[151] "Antonios Tsitas"                       
[152] "Orin Thomas Upshaw"                    
[153] "Charles Van Den Broeck"                
[154] "Franois Van Hoorenbeek"                
[155] "Antonius \"Anton\" van Loon"           
[156] "Willem van Loon"                       
[157] "Marinus Cornelis \"Rinus\" van Rekum"  
[158] "Willem \"Wim\" van Rekum"              
[159] "Spyridon \"Spyros\" Vellas"            
[160] "Paulus Visser"                         
[161] "Bruno Julius Wagner"                   
[162] "Christopher Walker"                    
[163] "Claes Ture Wersll"                     
[164] "Charles Gustav Wilhelm Winckler"       
[165] "Joseph Winston"                        
[166] "Josef Wittmann"                        
[167] "Anders Hjalmar Wollgarth"              
[168] "James Henry Woodget"                   
[169] "Gustave Marius L. Wuyts"               
[170] "Amedeo Zotti"                          
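
The logical vector does not have to be stored in its own variable first, and the comparison does not have to be on character data. A minimal sketch with a made-up numeric vector called ages (these values are not taken from the athletes data):

```r
# hypothetical ages, not taken from the athletes data
ages <- c(19, 34, 27, 41, 22)

ages > 25          # FALSE TRUE TRUE TRUE FALSE
ages[ages > 25]    # only the elements where the comparison is TRUE: 34 27 41
sum(ages > 25)     # how many elements match: 3
```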

Using the Olympic athletes data, create a logical vector which is true when an athlete’s sport is wrestling. Then access the age of all wrestlers, and assign the ages to another vector. Finally, compute the average age of the wrestlers vector (remember, there may be duplicate athlete names, so this average won’t mean much; the emphasis is on indexing right now).

Logical vectors can also be used to subset a data frame based on some condition. That is, we take entire rows which meet a condition, rather than just a single variable. For example:

# Subset the athletes data frame to get only Summer athletes.
athletes_summer <- athletes[athletes$Season == "Summer",]

In the last example, we are creating the logical vector and immediately using it to index the rows. Pause and think through what’s happening in this example if it’s not quite clear. Also note the placement of the comma (,), which indicates that we’re indexing rows, not columns, of the data frame.
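
To try the same row-subsetting pattern without the athletes data, here is a sketch using the mtcars data frame that ships with R. The variable name four_cyl is just an illustrative choice.

```r
# mtcars ships with R; its cyl column holds the number of cylinders
four_cyl <- mtcars[mtcars$cyl == 4, ]   # rows where the condition is TRUE, all columns

nrow(four_cyl)          # number of rows kept
unique(four_cyl$cyl)    # 4
ncol(four_cyl)          # same number of columns as mtcars
```

Leaving the column position blank after the comma is what keeps every column in the result.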

Another common use for logical indexing is filling in missing values. As part of data cleaning, you may decide to change NA values to some other value. This is easy since we can create a logical vector which is TRUE when a value is NA. We can do this with the is.na function:

# For athletes with no medal, replace `NA` with "No Medal"
athletes$Medal[is.na(athletes$Medal)] <- "No Medal"

This is not an endorsement of a particular approach to handling missing values. There are many situation-dependent considerations that should be weighed when deciding the best thing to do.
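
To see each step of the replacement on its own, here is a minimal sketch with a short made-up vector named medal (hypothetical values, not the real Medal column):

```r
# a hypothetical character vector with missing values
medal <- c("Gold", NA, NA, "Bronze", NA)

is.na(medal)                        # FALSE TRUE TRUE FALSE TRUE
medal[is.na(medal)] <- "No Medal"   # assign only where the value is missing
medal
```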

5.5.5.2 Negative Indexing

Sometimes it’s easier to specify which columns or rows should be excluded from indexing, rather than those that should be included. To select every column except the first one, you can use a negative index:

athletes[,-1]  # leave out the ID column

Negative indexing also accepts a vector of indices, which excludes all of them at once:

athletes[-c(1:10),]  # access all but the first 10 rows.
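
Negative indices are not limited to data frames. A minimal sketch on a plain vector, where w is a hypothetical example:

```r
# a hypothetical character vector
w <- c("a", "b", "c", "d", "e")

w[-1]          # everything except the first element: "b" "c" "d" "e"
w[-c(2, 4)]    # everything except elements 2 and 4: "a" "c" "e"
w[-(1:3)]      # everything except elements 1 through 3: "d" "e"
```

The parentheses in -(1:3) matter: without them, -1:3 means the sequence from -1 up to 3, which is not a set of exclusions.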

5.5.5.3 Nested indexing: [[1]][3]

In R, it’s likely that at some point you will encounter nested data structures, such as vectors within lists (data frames!) and lists within lists. This might make indexing more confusing at first, but a little practice will help you keep things organized in your mind. Consider the following example:

# create a vector and a matrix
V <- 1:16
M <- matrix(V, 4, 4)

# create a list which contains them:
L <- list(V, M)

# create a character vector
C <- c("I", "think", "therefore", "I", "am")

# create another list which will contain L and C:
L2 <- list(L, C)

With lists like this, it’s easy to see code like L2[[1]][[2]][2,3] and get confused about what is happening. It’s best to break down the statement from left to right:

L2                 # the second list, L2
L2[[1]]            # the first component of L2, which is the first list, L
L2[[1]][[2]]       # the second component of L, which is the matrix M
L2[[1]][[2]][2,3]  # the element in the second row and third column of M.
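
One way to convince yourself of this breakdown is to store each step in its own variable. The sketch below rebuilds the same objects; the names step1 and step2 are introduced here only for illustration.

```r
# rebuild the nested structure from the example above
V <- 1:16
M <- matrix(V, 4, 4)
L <- list(V, M)
C <- c("I", "think", "therefore", "I", "am")
L2 <- list(L, C)

step1 <- L2[[1]]       # the list L
step2 <- step1[[2]]    # the matrix M
step2[2, 3]            # row 2, column 3 of M: 10

L2[[1]][[2]][2, 3]     # the same value in one expression: 10
L2[[2]][3]             # second component (C), then its 3rd element: "therefore"
```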

We have discussed quite a few ways to index data, but rest assured there are more ways that we did not discuss! We won’t address them now, to keep things simple!